Abstract

PR/SET domain containing 9 (Prdm9) mediates histone modifications such as H3K4me3 and marks
hotspots of meiotic recombination. In many mammalian species, the Prdm9 gene is highly polymorphic.
Prdm9 polymorphism is assumed to play two critical roles in evolution: to diversify the spectrum of
meiotic recombination hotspots and to cause male hybrid sterility, leading to reproductive isolation and
speciation. Nevertheless, information about Prdm9 sequences in natural populations is very limited. In this
study, we conducted a comprehensive population survey on Prdm9 polymorphism in the house mouse,
Mus musculus. Overall M. musculus Prdm9 displays an extraordinarily high level of polymorphism,
particularly in regions encoding zinc finger repeats, which recognize recombination hotspots. Prdm9 alleles
specific to various M. musculus subspecies dominate in subspecies territories. Moreover, introgression into
other subspecies territories was found for highly divergent Prdm9 alleles associated with t-haplotype. The
results of our phylogeographical analysis suggest that the requirement for hotspot diversity depends on
geographical range and time span in mouse evolution, and that Prdm9 polymorphism has not been
maintained by a simple balanced selection in the population of each subspecies.
1. Introduction

Meiotic recombination enhances the genetic diversity
in natural populations and contributes to genome
evolution. In organisms as diverse as yeasts and mammals,
meiotic recombination events do not take place at
random but are clustered at specific genomic regions,
referred to as recombination hotspots. 1,2 Nevertheless,
until recently, the molecular basis underlying
determination of the hotspots has been elusive.

We previously reported that wm7, a wild mouse-derived
haplotype of the major histocompatibility complex (MHC)
on chromosome 17, enhances meiotic recombination at a
hotspot within the MHC. 3 Subsequently, we found that a
factor genetically linked to this hotspot determines its
recombination rate, and that the wm7 haplotype carries
a recombination-enhancing factor. 4 Recently, the factor
was reported to be a histone methyltransferase, PR/SET
domain containing 9 (Prdm9). 5-7 Prdm9 mediates
histone modifications, such as H3K4me3, and is
thought to mark recombination hotspots. Since its
identification, many reports have shown that Prdm9
polymorphisms correlate well with site variation in hotspots
in different mammalian species, including humans,
chimpanzees, and mice. 8-12 Genome-wide ChIP analysis
with antibodies that recognize DMC1 and RAD51,
performed for two different Prdm9 alleles in a common
genetic background, revealed that Prdm9 variation can
account for the site specificity of almost all of the DNA
double-strand breaks that initiate meiotic
recombination. 11 Thus, Prdm9 appears to be a major trans-acting
factor for determining the spectrum of hotspots in
mice and humans, and perhaps in other mammalian
species as well. 8-11

© The Author 2014. Published by Oxford University Press on behalf of Kazusa DNA Research Institute.

This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/
licenses/by-nc/3.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited.
For commercial re-use, please contact journals.permissions@oup.com

A Prdm9 knockout mouse shows meiotic arrest,
indicating that Prdm9 histone methyltransferase activity is
involved in progression of meiosis. 13 This function has
also been implicated in reproductive isolation, a
process that prevents free exchange of genes between
two genetically divergent populations, leading to
speciation. In crosses between the mouse subspecies
Mus musculus domesticus and M. m. musculus, the F1 male hybrids
are sometimes sterile. The locus responsible for this
hybrid sterility was named Hybrid sterility 1 (Hst1),
and mapped to chromosome 17. Recent results revealed
that Hst1 is identical to Prdm9, such that Prdm9 became
the first speciation gene to be reported in mammalian
species. 14-16

Comprehensive information on Prdm9 polymorphisms
in natural populations should provide an important
insight into Prdm9 functions, especially in evolution. Past
studies have focused on Prdm9 polymorphism in human
populations. The results revealed that the worldwide
population is quite diverse, with differences including repeat
number variations of the zinc finger (ZF) DNA-binding
repeat and amino acid substitutions in the zinc finger
array (ZFA) of the Prdm9 C-terminal domain. 7,10,17
Importantly, non-synonymous substitutions preferentially
occur at three amino acids along the α-helix domain of
the ZF, which are involved in recognition of the hotspot
nucleotide motifs. 18 Currently, it is thought that hotspot
diversity in natural human populations can be attributed
to polymorphism of the Prdm9 ZFA. 19

There are marked advantages to studying Prdm9
polymorphism in natural populations of M. musculus.
First, the phylogenetics of M. musculus is well
established. 20,21 Mus musculus is a complex species, and
comprises distinct 'phylogroups' or subspecies. The results of
extensive phylogenetic analysis of Eurasian wild mice
revealed that these subspecies diverged roughly
0.5-1.0 million years ago. 20 Their habitats are demarcated
throughout the Eurasian continent. 21,22 Secondly,
whereas subspecies of M. musculus are thought to be
in an early stage of speciation, neighbouring species,
including M. spretus, M. macedonicus, and M. spicilegus,
inhabit areas overlapping those of M. musculus. 22
Thus, M. musculus and its neighbouring species
provide an ideal model system to study phylogeography
and speciation. Thus far, mouse Prdm9 polymorphism
has been investigated in commonly used laboratory
inbred strains and inbred strains derived from wild
mice. 5,7,23 However, in these studies, sample collection
was limited, as laboratory inbred strains originate
predominantly from a single western European subspecies,
M. m. domesticus. 24,25 Moreover, only a limited number
of inbred strains derived from wild-captured mice were
included in previous studies. 5,23

In this study, we extended the population survey to
wild mice collected in natural populations of M. musculus
subspecies and neighbouring species, as well as
inbred strains derived from wild mice. We also
investigated Prdm9 polymorphism in mice with the
t-haplotype chromosome variant, which is characterized by
long inversions on chromosome 17 and is linked to
the Prdm9 locus. Genes on the t-haplotype are highly
divergent from those on wild-type chromosome 17. 26
The results of our study confirm that Prdm9
polymorphisms are concentrated in ZFA with extensive
variation of the ZF repeat number and hyper-variation of
amino acids at the three DNA recognition sites within
the ZF. Our survey of 79 wild-captured mice and 37
inbred strains revealed as many as 57 different Prdm9
alleles in M. musculus. In contrast, some alleles were
predominant in two subspecies, M. m. domesticus and
M. m. musculus. The overall phylogeography of mouse
Prdm9 reflects evolutionary episodes of this species.
More importantly, Prdm9 alleles that predominate in
one subspecies are often found in territories of other
subspecies. Likewise, highly divergent Prdm9 alleles
associated with the t-haplotype were found to
introgress into all subspecies of M. musculus.



2. Materials and methods 

2.1. Mice 

Nine mouse strains (M. m. molossinus, MSM/Ms;
M. m. musculus, NJL/Ms, KJR/Ms, BLG2/Ms, SWN/Ms,
CHD/Ms; M. m. castaneus, HMI/Ms; M. m. domesticus,
PGN2/Ms, BFM/Ms), one Japanese fancy mouse-derived
strain (M. m. molossinus, JF1/Ms), and two MHC
congenic mouse strains (B10.R209, B10.D2-TCH/+)
were maintained at the Genetic Strains Research
Center, National Institute of Genetics (NIG). A classical
laboratory mouse strain, C57BL/10Snf, was purchased
from the Jackson Laboratory and maintained at the
NIG. The inbred strain SPR2/Rbrc, derived from M.
spretus, was provided by the RIKEN BioResource Center
(BRC) through the National BioResource Project, which
is funded by the Ministry of Education, Culture, Sports,
Science and Technology (MEXT), Japan. Inbred strains
used in this study are listed in Supplementary Table S1.
All animal experiments were performed in accordance
with protocols approved by the Animal Care and Use
Committee of NIG.
2.2. Prdm9 cDNA synthesis

Total testes RNA from each mouse strain was isolated
with Isogen (Nippon Gene). Complementary DNA
(cDNA) was synthesized using the Primescript RT
reagent kit (TAKARA) according to the manufacturer's
instructions.

2.3. Mouse genomic DNA samples and PCR conditions

Most genomic DNA samples, including classical inbred
and wild-captured mice, were prepared by the Genetic
Strains Research Center at the NIG. Some were prepared
at Hokkaido University. Several genomic DNA samples
were purchased from the Jackson Laboratory. Genomic
DNA samples from t-haplotype mice were kindly
provided by Joe Nadeau or by the RIKEN BRC. All the
genomic DNA samples are listed in Supplementary
Table S2. To prevent errors in PCR, we used high-fidelity
DNA polymerases (KAPA HiFi from Kapa Biosystems or
KOD Neo FX from Toyobo). In addition, we repeated
independent amplification at least three times for each
sample. PCR primer sets and conditions are shown in
Supplementary Table S3, and the amplified sites are
shown in Supplementary Fig. S6.

2.4. Sequence analysis 

To determine the sequences of cDNA, ZFA, high SNP
region 1 (HSR1) of Prdm9, and an intron of T-complex
protein 1 (Tcp1), we sequenced PCR products directly
or after subcloning. For subcloning, PCR products were
extracted with a QIAquick Gel Extraction Kit (QIAGEN)
after electrophoresis. Then, the purified PCR products
were subcloned into pCR-Blunt II-TOPO (Invitrogen).
For sequencing, we used the BigDye Terminator v3.1
Cycle Sequencing Kit (Applied Biosystems) and a
3130XL DNA Analyser (Applied Biosystems). Primers
used for sequence analyses are listed in Supplementary
Table S4. We analysed at least six clones per sample and
carried out multiple independent experiments for each
allele. All sequence data from this study were submitted
to the DDBJ Sequence Read Archive (Accession numbers
AB843858 to AB844116 and AB846828).

2.5. Coding of ZF repeats 

The Prdm9 ZFA nucleotide sequence was conceptually
translated (Fig. 1B), and then the sequence of 28
amino acids from 512S to the last residue, which
corresponds to the ZFA, was extracted from all ZF repeats. We
assigned a one-letter box with a given colour to every ZF
repeat that had a given amino acid triad at the most
variable positions of the α-helix domain (-1, 3, and
6). When amino acid variation was found for a ZF
repeat at less variable positions (-5, -2, and 1), the
right bottom of the box was labelled with the variant
amino acids.
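
The coding scheme described above can be sketched in a few lines of Python. This is an illustration only, not the procedure actually used to draw the figures: the input format (one dictionary per repeat, mapping an α-helix position to a one-letter amino acid), the function name `code_zf_array`, the order in which letters are handed out, and the toy residues are all assumptions.

```python
from string import ascii_uppercase, ascii_lowercase

MOST_VARIABLE = (-1, 3, 6)   # triad that defines the one-letter code
LESS_VARIABLE = (-5, -2, 1)  # variants here become a sub-label
LETTERS = ascii_uppercase + ascii_lowercase  # 52 codes; more triads would need a larger alphabet


def code_zf_array(repeats):
    """Return one (letter, sub_label) pair per ZF repeat.

    `repeats` is a list of dicts mapping an alpha-helix position to the
    one-letter amino acid found there (hypothetical input format).
    """
    letter_of = {}  # triad -> one-letter code, assigned in order of first appearance
    baseline = {}   # triad -> less-variable residues of the first repeat with that triad
    coded = []
    for rep in repeats:
        triad = tuple(rep[p] for p in MOST_VARIABLE)
        minor = tuple(rep[p] for p in LESS_VARIABLE)
        if triad not in letter_of:
            letter_of[triad] = LETTERS[len(letter_of)]
            baseline[triad] = minor
        # The sub-label lists residues that differ from the first repeat with this triad.
        sub = "".join(aa for aa, ref in zip(minor, baseline[triad]) if aa != ref)
        coded.append((letter_of[triad], sub))
    return coded


# Toy residues, invented for illustration only.
example = [
    {-5: "S", -2: "T", -1: "Q", 1: "N", 3: "H", 6: "Q"},
    {-5: "S", -2: "T", -1: "A", 1: "K", 3: "N", 6: "Q"},
    {-5: "S", -2: "S", -1: "Q", 1: "N", 3: "H", 6: "Q"},
]
print(code_zf_array(example))  # [('A', ''), ('B', ''), ('A', 'S')]
```

In this sketch the third repeat shares its triad with the first and therefore receives the same letter, while its variant residue at position -2 is reported as the sub-label.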



2.6. Phylogenetic analysis 

To construct a phylogenetic tree of ZFA, we aligned
ZFA repeat units using a progressive multiple sequence
alignment algorithm implemented in ClustalW. 27
Briefly, each repeat unit was converted to a one-letter
code according to the amino acid residues at five
signature sites, i.e. the three most variable and two less
variable among ZF repeats. Mismatch scores between
repeat units were given by the number of different
signature sites between two units. The gap open penalty
was set to 0.5 and the gap extension penalty
to 0.1. The algorithm aligns ZF repeat unit sequences
by maximizing the alignment score, as is typical for
nucleotide and protein sequence alignment. After
alignment, the repeat unit sequences were transformed
into nucleotide sequences and nucleotide distance was
measured using Kimura's two-parameter method. 28 A
phylogenetic tree was constructed using the
neighbour-joining method 29 and implemented using the
MEGA 5 software. 30 A phylogenetic tree of HSR1 and
Tcp1 was similarly constructed using neighbour-joining
with Kimura's two-parameter distances. Bootstrap
resampling tests were conducted with 1000 iterations.
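
Two of the quantities named above are simple enough to write out. The following Python sketch is not the ClustalW or MEGA 5 implementation. It only illustrates (i) a mismatch score counted as the number of differing signature sites and (ii) the standard Kimura two-parameter distance, d = -1/2 ln[(1 - 2P - Q) sqrt(1 - 2Q)], where P and Q are the proportions of transitions and transversions. The function names and the handling of gaps and ambiguous bases (pairwise skipping) are assumptions.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def signature_mismatch(unit_a, unit_b):
    """Number of signature sites at which two repeat units differ."""
    return sum(1 for x, y in zip(unit_a, unit_b) if x != y)


def k2p_distance(seq_a, seq_b):
    """Kimura two-parameter distance between two aligned nucleotide sequences."""
    transitions = transversions = compared = 0
    for x, y in zip(seq_a.upper(), seq_b.upper()):
        if x not in "ACGT" or y not in "ACGT":
            continue  # skip gaps and ambiguous bases (an assumption of this sketch)
        compared += 1
        if x == y:
            continue
        if {x, y} <= PURINES or {x, y} <= PYRIMIDINES:
            transitions += 1    # purine<->purine or pyrimidine<->pyrimidine
        else:
            transversions += 1  # purine<->pyrimidine
    if compared == 0:
        raise ValueError("no comparable sites")
    p = transitions / compared
    q = transversions / compared
    # math.log raises ValueError when the sequences are too divergent (saturation).
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))


print(signature_mismatch("QHQTN", "QNQTN"))        # 1
print(k2p_distance("ACGTACGTAC", "GCGTACGTAC"))    # one transition in ten sites
```

For the second call, P = 0.1 and Q = 0, so the distance is -0.5 ln(0.8), about 0.112, slightly larger than the raw proportion of differing sites.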



3. Results 

3.1. Polymorphism in the cDNA sequence of Prdm9

To determine the overall nature of mouse Prdm9
polymorphism, we first cloned Prdm9 cDNAs by PCR
from total RNAs prepared from the testes of 12 inbred
strains (Supplementary Table S1). These comprised
seven strains derived from wild-captured mice
belonging to different subspecies, two congenic strains
harbouring wild-derived Prdm9 alleles, one inbred strain of
the Japanese fancy mouse (JF1/Ms), and one inbred
strain derived from a different species, M. spretus. The
PCR products showed marked variation in size. For each
sample, the major band was cloned and then sequenced
using a capillary sequencer.

Comparison between these 12 sequences yielded a
total of 28 nucleotide changes in the 1565-bp Prdm9
ORF, excluding the ZFA (Fig. 1). Of these, 19 cause
amino acid changes. When compared with the C57BL/6
reference sequence (NCBI mm9) and our C57BL/10
sequence (this study), insertions and deletions
(indels) without frame shifts were observed in two strains,
CHD/Ms and B10.D2-TCH/+ (Supplementary Fig. S1).
No difference was found in the PR/SET domains of the
12 strains examined. In contrast, the repeat numbers
of ZF were largely variable among these strains,
consistent with changes in the electrophoretic mobility of the
PCR products (Supplementary Fig. S2A). In addition, we
found numerous nucleotide changes in ZFA. Almost all
of these were associated with amino acid changes.
Figure 1. Polymorphisms in Prdm9 cDNA sequences from inbred mouse strains. Amino acid polymorphism of Prdm9 excluding ZFA is
summarized. Full-length cDNA of Prdm9 was sequenced for 12 inbred strains derived from M. musculus and one inbred strain derived
from a neighbouring species, M. spretus. Nucleotide sequences of the inbred strains were compared with the C57BL/6J reference
sequence (NCBI mm9). The C57BL/10J sequence was identical to the reference. Comparison revealed nucleotide substitutions that lead
to 16 amino acid substitutions in total, as well as two insertions and one deletion of an amino acid relative to the reference. No amino
acid substitution was observed in the PR/SET domains. Upper arrows indicate positions of amino acid variation. An asterisk (*) indicates
an interspecific variation (or variation in B10.D2-TCH/+). All amino acid variations are listed in the table below the diagram of Prdm9
protein. The letters S, I, and D in variant types indicate substitution, insertion, and deletion of an amino acid, respectively.



3.2. ZFA polymorphism of wild-captured mice 

We next extended our survey of Prdm9 ZFA
polymorphisms to wild-captured mice. The entire ZFA
region is encoded by the last exon. We directly amplified
this exon by PCR from genomic DNA from a total of 79
wild mice collected at different locations (Supplementary
Table S3). The samples include populations belonging
to three major M. musculus subspecies and a
neighbouring species. To avoid artificial PCR products, we used
high-fidelity DNA polymerase and highly stringent
conditions. Moreover, PCR primer sets were designed for
regions 200 bp apart from both ends of the ZFA to
prevent mis-annealing of PCR intermediates. PCR
products were separated on an agarose gel and the major
band, which displays extensive variation in size among
samples (Supplementary Fig. S2B), was subjected to
subcloning. When two major bands existed, both bands
were subcloned, as they are likely to represent
heterozygous alleles. Reproducibility was confirmed by repeated
independent PCR amplifications. More than six clones
were sequenced for each sample with two different
primer sets. If two types of ZFA sequences were
reproducibly obtained, they were judged as heterozygous alleles.
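
The decision rule in the preceding paragraph can be written down as a small Python sketch. It is a hypothetical formalization, not the authors' procedure: the function name `call_zfa_genotype`, the input format, the reading of "reproducibly" as "observed in every independent amplification", and the threshold of six clones (taken from the Materials and methods) are all assumptions, and the variant labels in the example are used only as placeholders.

```python
MIN_CLONES = 6  # Materials and methods: at least six clones per sample


def call_zfa_genotype(replicates, min_clones=MIN_CLONES):
    """Call a ZFA genotype from clones of independent PCR amplifications.

    `replicates` is a list of lists; each inner list holds the ZFA variant
    label of every sequenced clone from one amplification (hypothetical format).
    """
    total = sum(len(clones) for clones in replicates)
    if total < min_clones:
        return ("undetermined", ())
    # Keep only variants observed in every independent amplification.
    reproducible = set(replicates[0])
    for clones in replicates[1:]:
        reproducible &= set(clones)
    alleles = tuple(sorted(reproducible))
    if len(alleles) == 1:
        return ("homozygous", alleles)
    if len(alleles) == 2:
        return ("heterozygous", alleles)
    return ("undetermined", alleles)  # zero or more than two reproducible variants


# Placeholder clone labels for two independent amplifications of one sample.
print(call_zfa_genotype([["Ma5", "Ma5", "Ca2"], ["Ma5", "Ca2", "Ca2"]]))
```

A variant that appears in only one amplification is discarded by the intersection, which is how this sketch treats likely PCR or cloning artefacts.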

We aligned all ZF repeats identified in this study
(Fig. 2A). The first ZF repeat appears to be uniform,
but the internal ZF repeats were highly variable. In
particular, DNA recognition positions -1, 3, and 6 were
highly variable (Fig. 2A), consistent with a previous
report. 17 A lower level of variation was found at
positions -5, -2, and 1. The last repeat tends to be less
variable than the internal repeats. In total, we identified
36 unique ZF repeats in ZFA. Of these, 24 are newly
identified in this study (Fig. 2A).

We further compared variation in the mouse ZF
repeats with those in human PRDM9, as obtained
from a public database (http://www.ncbi.nlm.nih.gov/
nuccore) (Fig. 2B). Amino acids in the backbone of
C2H2-class ZFs are well conserved between the two
species; however, some species-specific amino acid
variations are found. For example, the amino acid at
position -2 of the α-helix domain is mostly threonine (T)
in the mouse, but serine (S) or arginine (R) in
humans. The amino acid at position 5 is mostly
isoleucine (I) in mouse, but exclusively leucine (L) in
human. These positions reside in the α-helix domain,
close to the most variable DNA recognition positions,
i.e. positions -1 and 6. Amino acid variation is also
found within each species. It is conceivable that this
variation results in changes to DNA recognition
patterns.
Figure 2. Amino acid variation in Prdm9 ZF repeats. (A) Multiple alignment of ZF repeat sequences from inbred strains and wild-captured mice.
The alignments were made separately for the first repeat, the internal repeats of the ZFA, and the last repeat, as their degrees of variation are
different (see text). The α-helix domain of the ZF is shown at top of the internal repeat. The one-letter code is shown to the left of the repeats.
A novel repeat found in this study is indicated by # to the right of the repeat. Amino acid variation at specific positions along aligned ZF repeats
is shown in colour. The star on the left side in the first repeat corresponds to those shown in Fig. 3. (B) Multiple alignment of ZF repeats of
human PRDM9 based on the publicly available data (http://www.ncbi.nlm.nih.gov/nuccore). The format of the alignment is the same as
that used for mouse Prdm9.



3.3. Phylogeography of polymorphisms in the ZFA of Prdm9 in natural populations

To simplify annotation of Prdm9 ZFAs in subsequent
analyses, we used a one-letter code to identify each ZF
repeat depending on the amino acid triads present at
the most variable three positions, -1, 3, and 6, as well
as additional alterations (Fig. 2A) (see Materials and
Methods for detail). Figure 3 presents an alignment of
all ZFA diagrams of wild-captured mice, inbred strains
derived from wild mice, and commonly used laboratory
inbred strains. The data indicate that wild-captured
mice have ZFAs of various lengths. The ZF repeat
numbers were between 9 and 16. We could identify
as many as 57 ZFA variants within a single species, M.
musculus. In addition, we found that neighbouring
species of M. musculus have ZFA variant types different
from those of M. musculus.

To elucidate the phylogenetic relationships among
ZFA variants in M. musculus, multiple alignment of the
ZFA was performed using the one-letter code. After
alignment, the phylogenetic tree was constructed
using the neighbour-joining method. 29 This
phylogenetic analysis revealed that the ZFA variants could be
divided into five major groups (Fig. 4A). The localities
where the wild mice were collected are then plotted
on the territories of three major subspecies of
M. musculus (Fig. 4B).



Group 1 exclusively includes mice collected in the
territory of subspecies M. m. musculus (hereafter, MUS),
which extends on the North Eurasian continent from
Eastern Europe to the Far East. Groups 2, 4, and 5
mainly include mice collected in the territory of
subspecies M. m. castaneus (CAS), which includes a wide range,
from Southwestern Asia to India, and extends to
Southeast Asia, South China, Indonesia, and the
southern and northern edges of Japanese islands. Group 3
includes mice collected in the territory of subspecies
M. m. domesticus (DOM), which extends from western
and southern Europe to the Middle East and the
shores around the Mediterranean Sea. Group 3
territory also includes the New World, North and South
America, coincident with human migration. This
group also includes commonly used classical inbred
strains, which are overwhelmingly derived from the
DOM lineage. 31 Groups 2 and 5 also include a small
number of mice collected in the MUS and DOM
territories, respectively. Likewise, Group 3 includes one
mouse collected in the CAS territory. In the Japanese
population, 12 of 13 mice (including two inbred
strains) showed only a single ZFA variant type (Ma5),
classified as Group 1. The remaining strain has the
variant type of Group 4 (Ca2).

Figure 3. ZFA alignment for inbred and wild-captured mice. We first assigned a one-letter coding system to ZFAs (see text), then aligned the ZFA
diagrams based on the sequences of classical laboratory strains, wild-derived inbred strains, wild-captured mice, neighbouring species of
M. musculus, and t-haplotype-bearing mice. Using this code, we easily classified all variant types of ZFA. ZFA boxes are aligned from the N-terminal
(left of the diagram) to the C-terminal end of the protein (right of the diagram). The diagrams are categorized into DOM- and CAS-related
ZFA variant types (left block), MUS-related ZFA variant type (centre block), and other ZFA types of neighbouring species and t-haplotype
(right block). The identification (ID) code for each ZFA diagram is indicated to the left of the diagram. The names of inbred strains and
taxonomy are indicated to the left of the ID codes.

No. 3] H. Kono et al.

[Figure 4A: phylogenetic tree of Prdm9 ZFA variant types (image not reproduced). Clade labels: Group-1: MUS; Group-2: CAS; Group-3: DOM; Group-4: CAS; Group-5: CAS; t-haplotype.]

Figure 4. Phylogeny and phylogeography of Prdm9 ZFAs. (A) Phylogenetic tree constructed from 57 Prdm9 ZFAs. The identification codes of ZFA
variant types are labelled according to the ZFA diagrams (Fig. 3). The 57 ZFA variant types can be divided into five major groups. The names of
inbred strains carrying a given ZFA variant type are shown. (B) Points of collection of wild-captured mice are shown on a world map.
Background colours indicate the ranges (territories) of each M. musculus subspecies as follows. Blue, DOM; green, MUS; purple, CAS. The
red line in Europe indicates a DOM-MUS hybrid zone, where the two subspecies come into contact. Dashed lines indicate borders
between other subspecies. For some species, the border regions are not clear. Japanese wild mice, M. m. molossinus, are a hybrid of two
subspecies, MUS and CAS.

Altogether, although M. musculus as a whole harbours
extensive variation in ZFA, the mice collected in the MUS
and DOM territories exhibit mutually exclusive ZFA
variant patterns (Group 1 for MUS and Group 3 for
DOM, respectively). The existence of such predominant
variant types is not seen in the case of the mice collected
in the CAS territory. In this region, ZFA variant types can
be further divided into at least three major groups.

We calculated the frequency of wild-captured mice
heterozygous for ZFA variant types in different
territories of subspecies (Supplementary Fig. S3). The value
for the total M. musculus population is 35% (26/74);
for DOM, 10% (2/20); for MUS, 44% (11/25); for
CAS, 50% (12/24); and for Japanese wild mice
(M. m. molossinus), 10% (1/10). These results are
consistent with the degree of ZFA diversity observed in
the three subspecies populations.
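
The percentages quoted above follow directly from the counts in parentheses. As a quick arithmetic check, a minimal Python sketch (the group labels and the helper name `heterozygosity_percent` are ours, not from the study) recomputes them:

```python
# (heterozygous mice, mice examined), exactly as quoted in the text
counts = {
    "M. musculus (total)": (26, 74),
    "DOM": (2, 20),
    "MUS": (11, 25),
    "CAS": (12, 24),
    "M. m. molossinus": (1, 10),
}


def heterozygosity_percent(heterozygous, examined):
    """Observed heterozygosity as a percentage, rounded to the nearest integer."""
    return round(100 * heterozygous / examined)


percentages = {group: heterozygosity_percent(*pair) for group, pair in counts.items()}
for group, value in percentages.items():
    print(f"{group}: {value}%")
```

The rounded values reproduce the figures given in the text: 35, 10, 44, 50, and 10%.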

3.4. Phylogeny of Prdm9 ZFA in t-haplotype mice

B10.D2-TCH/+ is a heterozygous MHC haplotype
mouse stock that harbours a recessive lethal mutation.
In this stock, the wild-derived MHC haplotype is
transmitted to the next generation at a highly distorted
ratio (unpublished data). Therefore, we inferred that
B10.D2-TCH/+ carries a t-haplotype (Supplementary Fig.
S4). Prdm9 cDNA from this strain did not have
polymorphisms observed for other strains in the PR/SET
domain; however, unique substitutions were found
outside the ZFA, as for M. spretus, in addition to a
DOM-type sequence. The result suggests that this
stock is heterozygous for DOM-type and more
divergent Prdm9 alleles (Fig. 1). Its ZFA has 11 repeats of ZF
with a rare type of the amino acid triad. To analyse
other strains carrying the t-haplotype, we amplified
the ZFA from seven t-haplotype samples, t w5 /+,
r 7, /+, r 7S /+, t 12 /+, t w12 /+, t 0 /+, and t w2 /t w2.
With the exception of t w2, all samples were
heterozygous for a DOM-type chromosome. Sequencing of
ZFA from all strains showed a rare variant type in
addition to a DOM-type ZFA. Notably, the sequence of the
rare variant is identical among samples carrying the
t-haplotype and B10.D2-TCH/+. Phylogenetic analysis
showed that the ZFA sequence associated with the
t-haplotype is more divergent from those of the five
major groups in M. musculus (Figs 3 and 4A).

We compared Prdm9 genome sequences among
three inbred strains, C57BL/6J, CAST/EiJ, and PWK/PhJ,
which are derived from three subspecies, DOM,
CAS, and MUS, respectively. We found nucleotide
sequence polymorphism in a 750-bp intronic region
of Prdm9 residing in the interval between two exons
that encode the PR/SET domain (Supplementary Fig.
S5). We named this HSR1 (Fig. 5A). To clarify the
phylogeny of regions outside the Prdm9 ZFA, we sequenced
HSR1 from different subspecies of wild-captured M.
musculus and neighbouring species. We also sequenced
a 2.38-kb region that includes introns 8-11 of Tcp1, a
t-haplotype marker, 26 to analyse the phylogeny of a gene
linked to Prdm9. We constructed phylogenetic trees
from these two sequences, HSR1 and Tcp1, using the
neighbour-joining method. 29 For tree construction,
M. caroli, which is more divergent from M. musculus
than other neighbouring species, 20 was used as an
outgroup. The overall topology of the Tcp1 tree is similar to
the tree based on Prdm9 ZFA (Fig. 5C). In particular,
Tcp1 in the t-haplotype has diverged from those in
the wild-type chromosome, consistent with a previous
report. 26 However, the sequence is similar to that
found in some mice from the CAS territory. In contrast,
the phylogenetic tree of the HSR1 sequence shows that
Prdm9 in the t-haplotype is similar to that found in mice
from the CAS and DOM territories, but distant from
those of mice from the MUS territory (Fig. 5B). These
results indicate that the single Prdm9 allele in
t-haplotype introgressed into all subspecies of M. musculus,
and suggest that intragenic recombination somewhere
in the interval between ZFA and HSR1 in the Prdm9
gene occurred in the past.

4. Discussion 

The results of this study show that mouse
Prdm9 is highly polymorphic. Indeed,
the degree of the polymorphism of Prdm9 is
comparable with that of MHC, which may be maintained by
selective advantage. 32 Prdm9 polymorphisms are concentrated
in the ZF repeats and in amino acid positions -1, 3,
and 6 of the α-helix domain, which are extremely
variable and are thought to recognize DNA sequences.
A lower level of variation was found for amino acid
positions -2 and 1 in both mouse and human ZF repeats.
We infer that all these positions have been subjected
to positive selection to increase the diversity of Prdm9
polymorphism, 17 as non-synonymous substitutions
preferentially occur at positions -2 and 1, both in mice
and in humans. In humans, 19 unique ZF repeats have
been identified by extensive analysis of genome
sequences from various human populations. 5,7,10,17,33
This number is lower than that in mouse (Fig. 2B). It is
likely explained by the shorter time of divergence
between different human populations, in contrast to a
longer period of divergence between different mouse
subspecies (i.e. 0.5-1.0 million years). 20

The first and last ZF repeats showed a lower level of
variation when compared with internal repeats. In the
first repeat, the first cysteine, which is involved in tertiary
structure of the C2H2-class ZF, has been lost. As a
consequence, it may not function as authentic ZF (Fig. 2A and
B). For the last repeat, the C2H2-class ZF is conserved and
amino acids at positions 3 and 6 are variable in mice
(Fig. 2A). These positions show a higher ratio of Ka/Ks
value (4:1), suggesting that they participate in DNA
recognition.

Figure 5. Phylogeny of the Prdm9 intronic sequence (HSR1) and Tcp1. (A) Upper diagram, map of the chromosomal region containing Tcp1 and
Prdm9. Lower diagram, view of the exon-intron organization of Prdm9. HSR1 is located in an intron between two exons that encode the
PR/SET domain. (B) Phylogenetic tree of Prdm9 HSR1. (C) Phylogenetic tree of Tcp1 sequence in introns 8-11. M. caroli was used as the outgroup
for both trees. An asterisk (*) to the right of a strain name (ZFA variant) indicates that the samples are heterozygous for Prdm9 HSR1 (B) and

The most prominent feature of ZFA polymorphism is
repeat number variation. In mouse populations, the
minimum number of repeats is 9, including the first
and last repeats (Fig. 3). This number was found in
three mouse subspecies. Longer repeats (15 and 16)
are enriched in the MUS territory, although some MUS
mice have shorter repeats. In the MUS territory,
although mice inhabiting the border with CAS territory
carry such shorter repeats, they share ZFA characteristics
of the MUS type.



The results of our phylogeograpical study clearly 
show that M. musculus as a species holds extensive 
Prdm9 polymorphism, owing to large degrees of vari- 
ation in the ZFA. Within a subspecies lineage, the ZFA 
variant types tend to be similar, with the exception of 
the CAS lineage. Even though MUS territory extends 
long distances, reaching from the Northern Eurasian 
continent to Eastern Europe and the Far East, most 
mice collected in this range are exclusively included in 
Group 1 (Fig. 4A and B). The Japanese population, 
M. m. molossinus, is a hybrid of two subspecies, MUS 
and CAS, but its genome is overwhelmingly derived 
from MUS. 34 We found that the majority of mice col- 
lected in different localities in Japanese islands have a 
single ZFA variant type (Ma5) of Group 1. Another 
variant type (Ca2) in Group 2 (CAS) is carried by one 
mouse sample with the wm7 MHC haplotype, which 
was first used for identification of Prdm9 as the 
hotspot determinant. 7 Thus, the present result supports
a hybrid origin of M. m. molossinus. 34 ' 36

Recent studies suggested that Southwest Asia and North 
India are the likely places of origin of M. musculus. 37-39
In this region, three major subspecies lineages, DOM, 
MUS, and CAS, likely separated from one another in 
subdivided regions and diverged over a relatively long 
evolutionary time period of 0.5-1.0 million years. 
Subsequently, the three lineages dispersed to their 
present ranges, probably associated with agricultural 
dispersal by humans. 22,37-39 The latter event is esti- 
mated to have occurred relatively recently, 10-20 
thousand years ago. 22,40,41 Eastward MUS lineages 
from the origin of M. musculus may have reached
the Far East, then migrated to the Japanese islands 2-3
thousand years ago through the Korean peninsula as 
stowaways during the transportation of rice, following 
preceding CAS migration from Southeast Asia, which 
might have occurred 5-10 thousand years ago. 22 If 
these evolutionary episodes of M. musculus are correct, 
then it appears that a large degree of Prdm9 diversity is 
not always required for survival in natural populations. 
A single version of the hotspot repertoire has been sufficient
to maintain the Japanese population over a time span
of at least 2-3 thousand years. Likewise, local populations
in MUS and DOM territories have survived for
10-20 thousand years with a limited polymorphism 
at Prdm9. Regarding the CAS lineage, the high hetero- 
geneity of ZFA in these mice is consistent with data sug- 
gesting that CAS consists of multiple sublineages, which 
show a relatively large degree of genome divergence 
from one another. Given that the CAS territory 
includes the likely place of origin of M. musculus, 4 ^ its
heterogeneity is reminiscent of the African human population,
which shows greater diversity of Prdm9 ZFA. 9 Thus,
overall, the phylogeographical features of Prdm9 polymorphism
in M. musculus are similar to what has been inferred
from information about many other genes. 21,22,42
Furthermore, we infer that the requirement for hotspot 
diversity depends on geographical range and time span 
in evolution, and that Prdm9 polymorphism has not 
been maintained by a simple balanced selection in the 
population of each subspecies. 

The results of this study show that subspecies-specific 
Prdm9 variant groups prevail in the demarcated terri- 
tories of M. musculus subspecies; however, the data also 
revealed intermingled variant groups in areas where the 
territory of one subspecies borders another (Fig. 4B). 
Moreover, except for Group 1, each of the major ZFA
variant groups contains small numbers of mice collected 
in the territory of other subspecies. A more prominent 
example of intermingled Prdm9 alleles was observed in 
the case of t-haplotype. 

Mouse t-haplotype harbours three inversions in 
chromosome 17. 43-45 The nucleotide sequences of
t-haplotype-associated genes have largely diverged from
those of the wild-type chromosome of M. musculus. 44,46
At present, in natural mouse populations, t-haplotype is
observed in all subspecies of M. musculus at a frequency
of 10-40%. 47,48 It is inferred that t-haplotype was
introgressed into all subspecies of M. musculus
10-20 thousand years ago as a single event, and
expanded to all subspecies during agricultural dispersal,
owing to the strong transmission ratio distortion of t-haplotype
relative to the wild-type chromosome. 44,46 This
study clearly shows that t-haplotype mice have unique 
and characteristic Prdm9 ZFA and intronic (HSR1) 
sequences. These data support the idea that the t-haplotype
is of monophyletic origin. In our phylogenetic
trees, Tcp1 and Prdm9 HSR1 in t-haplotype appear in
the same clade as those from mice collected in the CAS
territory. This suggests that t-haplotype originated in a 
sublineage of the CAS subspecies and rapidly intro- 
gressed into all subspecies of M. musculus. 

Our phylogenetic analysis also revealed that the top- 
ology of the ZFA tree differs from that of the HSR1 tree. 
This implies that intragenic recombination occurred in 
the interval between these two regions, despite the fact 
that they are only 9 kb apart. If ZFA contains a recom- 
bination hotspot, an associated frequent and unequal 
recombination might give rise to repeat number vari- 
ation of ZFs. 
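
The way unequal recombination could change repeat number can be illustrated with a minimal sketch. It is a simplified model under stated assumptions, not an analysis performed in this study: breakpoints are placed only at repeat boundaries, and the function name, the repeat labels, and the 12-repeat arrays are all hypothetical placeholders rather than real ZF sequences.

```python
from typing import List, Tuple


def unequal_crossover(
    a: List[str], b: List[str], i: int, j: int
) -> Tuple[List[str], List[str]]:
    """Exchange array tails at a breakpoint after repeat i of `a` and repeat j of `b`.

    The two products are a[:i] + b[j:] and b[:j] + a[i:]. When i != j the
    arrays are paired out of register, so the products differ in length.
    """
    if not (0 <= i <= len(a) and 0 <= j <= len(b)):
        raise ValueError("breakpoints must lie within the arrays")
    return a[:i] + b[j:], b[:j] + a[i:]


# Two hypothetical 12-repeat arrays (labels are placeholders).
allele_a = [f"A{k}" for k in range(1, 13)]
allele_b = [f"B{k}" for k in range(1, 13)]

# Pairing misaligned by two repeats: after repeat 7 of A and repeat 5 of B.
longer, shorter = unequal_crossover(allele_a, allele_b, i=7, j=5)
print(len(longer), len(shorter))  # prints "14 10"
```

In this toy model an in-register exchange (i equal to j) leaves both arrays at 12 repeats, whereas a misalignment of two repeats yields one array of 14 and one of 10. The total number of repeats is conserved, but each product is a new length variant.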